Economic Benefits of Getting a Degree

Data Bootcamp Final Project
Yuxin Xia (yx784@nyu.edu)
Dec.15th 2016

Overview:
There are many reasons for people going to colleges, such as obtaining knowledge, meeting new people and expanding social networks. Within these benefits, getting a diploma is an attractive one. In most cases, people believe that diplomas will benefit their career in the future and help them earn more.
In this project, I will try to quantity the economic benefits of getting a degree(including bachelor, master and doctor degrees) with two main data availabe online: percentage of workers with different degrees and the weekly salary of each occupation. Compared with the tuition people paid to get a degree, this project can help people see if their college education is worthwhile from an economic perspective of view.

Related Website:



In [1]:

    
import sys                             # system module
import pandas as pd                    # data package
import matplotlib as mpl               # graphics package
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly 
plotly.offline.init_notebook_mode() # run at the start of every ipython notebook

General Picture

First, I am going to show a general picture of the annual wage for people with different education levels. The table I got contains the median annual wage for people of different degrees.

Data Sourse:
title of the file: Employment, wages, and projected change in employment by typical entry-level education (Employment in thousands)
go to page and click on “Other available formats: (XLSX)” on the top to get the file downloaded to the laptop.
I uploaded the file to Github.



In [2]:

    
#read the file from github and only take the sheet with name "summary"
data_general = pd.read_excel("https://github.com/yuxinXia96/final-project/raw/master/education.xlsx",\
                             sheetname = 'Summary')



In [3]:

    
#drop rows with missing data
data_general= data_general.dropna()
#keep columns: oppupation and annual wage, drop cloumns: number of worker, percentage distribution, employment change
data_general = data_general.drop(['Unnamed: 1','Unnamed: 2','Unnamed: 3'],axis = 1)
#rename the columns
data_general = data_general.rename(columns = {'Employment, wages, and projected change in employment by typical \
entry-level education (Employment in thousands)':'Education','Unnamed: 4':'Median annual wage(2015)'})



In [4]:

    
#get the medium annual wage within all the workers, add it as a column
data_general['Mean'] = data_general['Median annual wage(2015)'][2]
#drop the row with information about the total population
data_general = data_general.drop([2])



In [5]:

    
#the table looks like:
data_general









    Out[5]:






  
    
      
      Education
      Median annual wage(2015)
      Mean
    
  
  
    
      3
      Doctoral or professional degree
      100490
      36200
    
    
      4
      Master's degree
      66420
      36200
    
    
      5
      Bachelor's degree
      70400
      36200
    
    
      6
      Associate's degree
      50230
      36200
    
    
      7
      Postsecondary nondegree award
      35660
      36200
    
    
      8
      Some college, no degree
      33870
      36200
    
    
      9
      High school diploma or equivalent
      36210
      36200
    
    
      10
      No formal educational credential
      21420
      36200



In [6]:

    
#plot
general_benefit = dict(type="scatter",                                      
           name="median annual wage for different degrees",                                      
           x=data_general['Education'],                             
           y=data_general["Median annual wage(2015)"],                             
           marker={"color": "blue"},
           mode = 'markers-lines',
            )
general_mean = dict(type="scatter",                                      
           name="Average annual wage",                                      
           x=data_general['Education'],                             
           y=data_general["Mean"],                             
           marker={"color": "purple"},
           mode = 'lines',
            )
layout = dict(                      
              yaxis={"title": "annual wage"},                   
              title="Annual Wage for People of Different Degrees",         
    )
iplot(go.Figure(data=[general_benefit,general_mean],layout=layout))

Although there are some exceptions (exceptions may raise from decresed working experience), generally sepaking, higher education leads to higher annual wage. However, the benefits of getting a degree differ from occupation to occupation. I am going to show the economic benefits of a degree to some detailed occupations.

Economic Benefits Ordered by Detialed Occpations

It’s clear that the higher education a person has, the easier for him/her to find a particular job. I found a file with the percentage of educational attainment for workers ordered by detailed occupations. To quantify the easiness of finding a job, for each education level, I used the cumulative percentage of workers with that occupation.
For each occupation, I multiplied the percentage of the workers with the weekly salary of that occupation. Then I multiplied the result with 52(52 weeks in year)to get the annual income. The changes of annual income with different education levels can show the economic benefits brought by education.

Data Sources:

Median weekly earnings of full-time wage and salary workers by detailed occupation and sex (Numbers in thousands)
This table show the median weekly wage for more than 500 occupaions around the U.S.
Educational attainment for workers 25 years and older by detailed occupation, 2014–15
go to page and click on “Other available formats: (XLSX)” on the top to get the file downloaded to the laptop.
I uploaded the file to Github.
This file contains the education attainment for workers with more than 800 occupations.



In [7]:

    
#read the weekly earning file from the Internet. Notice that the type of the file is list.
wage = pd.read_html("http://www.bls.gov/cps/cpsaat39.htm") 
#data_wage[0] and data_wage[1] have similar data, so only keep one
wage = wage[1] 
#convert list into dataframe
data_wage = pd.DataFrame(wage,columns = ['Occupation','2015','Total'])
#see the size of the file
data_wage.shape









    Out[7]:





(570, 3)



In [8]:

    
#some adjustment to get the data I need -- occupation,number of workers, weekly salary

#total wage: remove $ sign
data_wage = data_wage.replace(to_replace='$809',value=809) 
#replace '-' with missing data NaN
data_wage = data_wage.replace('-',pd.np.nan)
#change the data type of 'Total'
data_wage['Total'] = data_wage['Total'].astype(float) 
#drop the rows with missing data
data_wage = data_wage.dropna()
#rename the column labels
data_wage = data_wage.rename(columns = {'2015':'number of workers','Total':'weekly salary'})
#see the first 5 rows of the table
data_wage.head(5)









    Out[8]:






  
    
      
      Occupation
      number of workers
      weekly salary
    
  
  
    
      0
      Total, full-time wage and salary workers
      109080.0
      809.0
    
    
      2
      Management, professional, and related occupations
      44844.0
      1158.0
    
    
      3
      Management, business, and financial operations...
      18422.0
      1258.0
    
    
      4
      Management occupations
      12480.0
      1351.0
    
    
      5
      Chief executives
      1046.0
      2041.0



In [9]:

    
#read education data from the file I have uploaded to the Internet, only read the sheet I need
data_education = pd.read_excel("https://github.com/yuxinXia96/final-project/raw/master/occupation.xlsx",sheetname=11)
#see the size of the table
data_education.shape









    Out[9]:





(823, 9)



In [10]:

    
#some adjustments to get the cumulative percentage of each degree for every occupation
#basically add the pertage of workers with education equal or lower than a certain degree
data_education['%high school']=data_education['Unnamed: 2']+data_education['Unnamed: 3']
data_education['%bachelor']=data_education['Unnamed: 2']+data_education['Unnamed: 3']+data_education['Unnamed: 4']\
    +data_education['Unnamed: 5']+data_education['Unnamed: 6']
data_education['%master']=data_education['Unnamed: 2']+data_education['Unnamed: 3']+data_education['Unnamed: 4']\
    +data_education['Unnamed: 5']+data_education['Unnamed: 6']+data_education['Unnamed: 7']
data_education['%doctoral']=data_education['Unnamed: 2']+data_education['Unnamed: 3']+data_education['Unnamed: 4']\
    +data_education['Unnamed: 5']+data_education['Unnamed: 6']+data_education['Unnamed: 7']+data_education['Unnamed: 8']
data_education = data_education.drop(['Unnamed: 1','Unnamed: 2','Unnamed: 3','Unnamed: 4','Unnamed: 5'],axis=1)
data_education = data_education.drop(['Unnamed: 6','Unnamed: 7','Unnamed: 8'],axis=1)
#treat 0 percentage of workers in a certain occupation as missing value
data_education = data_education.replace(0,pd.np.nan)
#get rid of the rows with missing value
data_education = data_education.dropna()
#drop the first row which is not an observation but some descriptions
data_education = data_education.drop([0])
#rename the column lables
data_education = data_education.rename(columns=
{'Table 1.11 Educational attainment for workers 25 years and older by detailed occupation, 2014-15':'Occupation'})
#get rid of all the (1) in the occupation list
data_education = data_education.replace('\(1\)','', regex=True)
#see the top 5 rows of the table
data_education.head()









    Out[10]:






  
    
      
      Occupation
      %high school
      %bachelor
      %master
      %doctoral
    
  
  
    
      1
      Total, all occupations
      32.9
      86.1
      95.8
      100.0
    
    
      2
      Chief executives
      10.7
      71.2
      92.7
      100.1
    
    
      3
      General and operations managers
      19.4
      87.6
      98.5
      100.0
    
    
      4
      Legislators
      10.7
      71.2
      92.7
      100.1
    
    
      5
      Advertising and promotions managers
      5.0
      85.8
      97.9
      99.9



In [11]:

    
#merge two tables(weekly wage and education)
benefits = pd.merge(data_wage, data_education,   # left and right df's
                 how='left',        # add to left 
                 on='Occupation'       # link with this variable
                )



In [12]:

    
#calculate the change in weekly salary
benefits['%added bachelor']=benefits['%bachelor']-benefits['%high school']
benefits['%added master']=benefits['%master']-benefits['%bachelor']
benefits['%added doctoral']=benefits['%doctoral']-benefits['%master']
#calulate the change in annual salary
benefits['B benefit']=benefits['%added bachelor']*benefits['weekly salary']*52
benefits['M benefit']=benefits['%added master']*benefits['weekly salary']*52
benefits['D benefit']=benefits['%added doctoral']*benefits['weekly salary']*52
#drop rows with missing value
benefits = benefits.dropna()



In [13]:

    
#plot
salary1 = dict(type="scatter",                                      
           name="economic benefit of a bachelor degeree",                                      
           x=benefits['Occupation'],                             
           y=benefits["B benefit"],                             
           marker={"color": "yellow"},
           mode = 'markers',
          # size = benefits['number of workers']
          #remove ticks on x axis
            )
salary2 = dict(type="scatter",                                      
           name="economic benefit of a master degree",                                      
           x=benefits['Occupation'],                             
           y=benefits["M benefit"],                             
           marker={"color": "orange"},
           mode = 'markers',
          # size = benefits['number of workers']
          #remove ticks on x axis
            )
salary3 = dict(type="scatter",                                      
           name="economic benefit of a doctorial degree",                                      
           x=benefits['Occupation'],                             
           y=benefits["D benefit"],                             
           marker={"color": "red"},
           mode = 'markers',
          # size = benefits['number of workers']
          #remove ticks on x axis
            )
#marker = dict(size = benefits['number of workers'])
layout = dict(                      
              yaxis={"title": "economic benefits(in $)"},                   
              title="Economic Benefits for Different Oppupations",         
    )

iplot(go.Figure(data=[salary1,salary2,salary3],layout=layout))

From the plot, it shows that Bachelar degree is most helpful in increasing salaries. Master and doctor degrees are more helpful to people with jobs in finance, research and education field(left half of the graph) than blue collars(right half of the graph). One interest point is the highest red dot. A doctor degree has huge impact on lawyers.

Economic Benefits of Different Degrees to Business People

As a Stern student, I am particularly interested in the economic benefits of different degrees to people working in business and finance field. As a result, I took out some occupations with words related to business in their titles.



In [14]:

    
#get a new table with occupations related to business, finance, operation and accounting
b = benefits[benefits.Occupation.str.contains('Financial')|benefits.Occupation.str.contains('operation')|\
         benefits.Occupation.str.contains('business')|benefits.Occupation.str.contains('account')|\
        benefits.Occupation.str.contains('financial')]
#see the data
b









    Out[14]:






  
    
      
      Occupation
      number of workers
      weekly salary
      %high school
      %bachelor
      %master
      %doctoral
      %added bachelor
      %added master
      %added doctoral
      B benefit
      M benefit
      D benefit
    
  
  
    
      5
      General and operations managers
      823.0
      1260.0
      19.4
      87.6
      98.5
      100.0
      68.2
      10.9
      1.5
      4468464.0
      714168.0
      98280.0
    
    
      11
      Financial managers
      1124.0
      1408.0
      11.1
      77.8
      97.5
      100.1
      66.7
      19.7
      2.6
      4883507.2
      1442355.2
      190361.6
    
    
      40
      Business operations specialists, all other
      186.0
      1090.0
      11.3
      75.7
      95.7
      100.0
      64.4
      20.0
      4.3
      3650192.0
      1133600.0
      243724.0
    
    
      42
      Financial analysts
      295.0
      1426.0
      2.3
      63.2
      96.2
      100.0
      60.9
      33.0
      3.8
      4515856.8
      2447016.0
      281777.6
    
    
      43
      Personal financial advisors
      407.0
      1419.0
      4.4
      73.2
      94.2
      99.9
      68.8
      21.0
      5.7
      5076614.4
      1549548.0
      420591.6
    
    
      48
      Financial specialists, all other
      66.0
      1162.0
      12.3
      77.9
      97.5
      100.1
      65.6
      19.6
      2.6
      3963814.4
      1184310.4
      157102.4
    
    
      203
      Securities, commodities, and financial service...
      211.0
      1155.0
      8.1
      79.6
      97.3
      99.9
      71.5
      17.7
      2.6
      4294290.0
      1063062.0
      156156.0
    
    
      211
      Bill and account collectors
      152.0
      657.0
      33.9
      97.3
      99.5
      100.0
      63.4
      2.2
      0.5
      2165997.6
      75160.8
      17082.0
    
    
      213
      Bookkeeping, accounting, and auditing clerks
      769.0
      692.0
      31.4
      96.4
      99.6
      100.1
      65.0
      3.2
      0.5
      2338960.0
      115148.8
      17992.0
    
    
      216
      Financial clerks, all other
      61.0
      767.0
      17.5
      85.1
      99.0
      100.0
      67.6
      13.9
      1.0
      2696158.4
      554387.6
      39884.0



In [15]:

    
#plot
s1 = dict(type="bar",                                      
           name="economic benefit of bachelor degeree",                                      
           x=b['Occupation'],                             
           y=b["B benefit"],                             
           marker={"color": "yellow"}
            )
s2 = dict(type="bar",                                      
           name="economic benefit of master degree",                                      
           x=b['Occupation'],                             
           y=b["M benefit"],                             
           marker={"color": "orange"},
            )
s3 = dict(type="bar",                                      
           name="economic benefit of doctorial degree",                                      
           x=b['Occupation'],                             
           y=b["D benefit"],                             
           marker={"color": "red"},
            )
#marker = dict(size = benefits['number of workers'])
layout = dict(                      
              yaxis={"title": "economic benefits (in $)"},                   
              title="Economic Benefits for Oppupations in Business and Finance Field",         
    )

iplot(go.Figure(data=[s1,s2,s3],layout=layout))

The picture shows that bachelor degrees are of great help to business people. Master degrees are helpful to people in finance field, but not that helpful to people who have the work related to accounting. Doctor degrees are helpful to people who work as analysts and specialists, but add less value compared with bachelor or master degrees.

Cost of Getting a Degree: Tuition

After examing the economic benefits of getting a degree, we now focus on the cost paid to get a degree. The cost we consider here is tuition. Unlike the payment for future jobs, people usually know exactly how much they need to pay for a certain school they are accepted. As a result, I am only going to show a general picture of college tuitions without seperating colleges according to their locations or major focuses. The annual tuition here is just a rough estimation of the cost.

Data Source:
The website leads to an online database. The important key words I chose are: use final release data; select institutions: by group—U.S. only; select variables: tuition and fee (2015-16). After selecting these key words, the database will provide a csv file with the data needed. I uploaded the file to Github.



In [16]:

    
data_tuition = pd.read_csv('https://raw.githubusercontent.com/yuxinXia96/final-project/master/CSV_1262016-477.csv')



In [17]:

    
#drop the columns with year info and ID info
data_tuition = data_tuition.drop(['unitid','year'],axis=1)
#drop rows with missing data
data_tuition = data_tuition.dropna()
#rename columns
data_tuition = data_tuition.rename(columns = {'institution name':'institution',
                                              'DRVIC2015.Tuition and fees, 2015-16':'tuition'})
#calculate average tuition
data_tuition['mean'] = data_tuition['tuition'].mean()



In [18]:

    
#plot
t = dict(type="scatter",                                      
           name="tuition 2015-16",                                      
           x=data_tuition['institution'],                             
           y=data_tuition["tuition"],                             
           marker={"color": "pink"},
           mode = 'markers',
            )
m = dict(type="scatter",                                      
           name="mean",                                      
           x=data_tuition['institution'],                             
           y=data_tuition["mean"],                             
           marker={"color": "brown"},
           mode = 'lines',
            )
layout = dict(                      
              yaxis={"title": "tuition(in $)"},                   
              title="Tuition 2015-16",         
    )
iplot(go.Figure(data=[t,m],layout=layout))

From the picture, it shows that the average tuition is around 15K per year, and tuition is almost within the range from 5K to 50K.

Recall the pictures we did previously, the benefits of barchalor degrees are mostly above 1.5M. Thus, going to a college is worthwhile even without considering the gain of knowledge and improvement of social networks. Also, the annual salary increase doesn’t consider the lifelong benefits from education. The increases in earnings will last much longer than the school years. So the actual economic benefits will be bigger than what we can see from the graph.



In [ ]:

	Education	Median annual wage(2015)	Mean
3	Doctoral or professional degree	100490	36200
4	Master's degree	66420	36200
5	Bachelor's degree	70400	36200
6	Associate's degree	50230	36200
7	Postsecondary nondegree award	35660	36200
8	Some college, no degree	33870	36200
9	High school diploma or equivalent	36210	36200
10	No formal educational credential	21420	36200

	Occupation	number of workers	weekly salary
0	Total, full-time wage and salary workers	109080.0	809.0
2	Management, professional, and related occupations	44844.0	1158.0
3	Management, business, and financial operations...	18422.0	1258.0
4	Management occupations	12480.0	1351.0
5	Chief executives	1046.0	2041.0

	Occupation	%high school	%bachelor	%master	%doctoral
1	Total, all occupations	32.9	86.1	95.8	100.0
2	Chief executives	10.7	71.2	92.7	100.1
3	General and operations managers	19.4	87.6	98.5	100.0
4	Legislators	10.7	71.2	92.7	100.1
5	Advertising and promotions managers	5.0	85.8	97.9	99.9

	Occupation	number of workers	weekly salary	%high school	%bachelor	%master	%doctoral	%added bachelor	%added master	%added doctoral	B benefit	M benefit	D benefit
5	General and operations managers	823.0	1260.0	19.4	87.6	98.5	100.0	68.2	10.9	1.5	4468464.0	714168.0	98280.0
11	Financial managers	1124.0	1408.0	11.1	77.8	97.5	100.1	66.7	19.7	2.6	4883507.2	1442355.2	190361.6
40	Business operations specialists, all other	186.0	1090.0	11.3	75.7	95.7	100.0	64.4	20.0	4.3	3650192.0	1133600.0	243724.0
42	Financial analysts	295.0	1426.0	2.3	63.2	96.2	100.0	60.9	33.0	3.8	4515856.8	2447016.0	281777.6
43	Personal financial advisors	407.0	1419.0	4.4	73.2	94.2	99.9	68.8	21.0	5.7	5076614.4	1549548.0	420591.6
48	Financial specialists, all other	66.0	1162.0	12.3	77.9	97.5	100.1	65.6	19.6	2.6	3963814.4	1184310.4	157102.4
203	Securities, commodities, and financial service...	211.0	1155.0	8.1	79.6	97.3	99.9	71.5	17.7	2.6	4294290.0	1063062.0	156156.0
211	Bill and account collectors	152.0	657.0	33.9	97.3	99.5	100.0	63.4	2.2	0.5	2165997.6	75160.8	17082.0
213	Bookkeeping, accounting, and auditing clerks	769.0	692.0	31.4	96.4	99.6	100.1	65.0	3.2	0.5	2338960.0	115148.8	17992.0
216	Financial clerks, all other	61.0	767.0	17.5	85.1	99.0	100.0	67.6	13.9	1.0	2696158.4	554387.6	39884.0